-
Notifications
You must be signed in to change notification settings - Fork 193
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for AWS EBS Snapshot Lock mode #1005
Add support for AWS EBS Snapshot Lock mode #1005
Conversation
6be293b
to
941d6af
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is a great feature that we were waiting for! Already tested it in an AWS EC2 instance with barman installed and it is working as expected!
ping |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi, thank you for your contribution! I’ve tested your code, and it’s functioning correctly. However, I did come across a few issues, particularly regarding the usage with a Barman server for managing backups (not the barman-cloud-backup script). Please review my comments and suggestions, make the necessary changes, and then we can move forward from there. @gustabowill will be helping as well!
if int_value < 1 or int_value > 36500: | ||
raise ArgumentTypeError("'%s' is outside supported range of 1-36500 days" % value) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if int_value < 1 or int_value > 36500: | |
raise ArgumentTypeError("'%s' is outside supported range of 1-36500 days" % value) | |
if not 1 <= int_value <= 36500: | |
raise ArgumentTypeError(f"Value must be between 1 and 36,500 days.") | |
This way is more efficient because the value is compared only once in a single operation and It directly expresses the idea that the value should fall within a specific range.
if int_value < 1 or int_value > 72: | ||
raise ArgumentTypeError("'%s' is outside supported range of 1-72 hours" % value) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if int_value < 1 or int_value > 72: | |
raise ArgumentTypeError("'%s' is outside supported range of 1-72 hours" % value) | |
if not 1 <= int_value <= 72: | |
raise ArgumentTypeError("Value must be between 1 and 72 hours.") |
same here.
@@ -783,6 +821,16 @@ def check_size(value): | |||
return int_value | |||
|
|||
|
|||
def check_timestamp(value): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
def check_timestamp(value): | |
def check_aws_expiration_date_format(value): |
More specific name to the job.
@@ -783,6 +821,16 @@ def check_size(value): | |||
return int_value | |||
|
|||
|
|||
def check_timestamp(value): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please, add docstrings like the other aws checks
@@ -1761,6 +1761,7 @@ def test_help_output(self, minimal_parser, capsys): | |||
if sys.version_info < (3, 10): | |||
options_label = "optional arguments" | |||
expected_output = self._expected_help_output.format(options_label=options_label) | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
revert or remove this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
def parse_datetime(value): | ||
""" | ||
Parse a string, transforming it in a datetime object. | ||
Accepted format: YYYY-MM-DDThh:mm:ss.sssZ | ||
|
||
:param str value: the string to evaluate | ||
""" | ||
|
||
return datetime.datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%fZ") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a little confusing in the sense that it is parsing a string to datetime but it is constraining it to a specific format, which is in fact just for AWS in this case.
def parse_datetime(value): | |
""" | |
Parse a string, transforming it in a datetime object. | |
Accepted format: YYYY-MM-DDThh:mm:ss.sssZ | |
:param str value: the string to evaluate | |
""" | |
return datetime.datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%fZ") | |
def parse_aws_datetime(value): | |
""" | |
Parse a string, transforming it in a datetime object. | |
Accepted format: YYYY-MM-DDThh:mm:ss.sssZ | |
:param str value: the string to evaluate | |
""" | |
return check_aws_expiration_date_format |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any ideas here @gustabowill? Maybe in the future we could re-think this if we have other formats.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can keep it as it is for now as this date format is very common.
I think you mistyped the return statement of your suggestion @andremagui.
) | ||
|
||
def _lock_snapshots(self, snapshots, lock_mode, lock_duration, lock_cool_off_period, lock_expiration_date): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be revisited and refactored.
- This should lock just one snapshot.
- The values should not be checked but passed to the the client method call. The checks will be done higher in the code flow, so they will come to this part as they should be. Also, pay attention that if this fails somehow, the backup will also fail. The snapshot will be created though.
- The response of the request to lock the snapshot has all the information you need to output information. Do not manipulate yourself the values in the response object, because if you use
LockDuration
ORLockExpiresOn
, each one has a different response. If the field is not present in the response object, the lock is successful but the backup fails. - Use
output.info()
instead oflogging.info()
. This can be importedfrom barman import output
.
To summarize, try to use this method to lock one snapshot, pass all the arguments, lock the snapshot and output information dynamically from the response object. In line 768, where we have for volume_identifier, volume_metadata in volumes.items():
, you should lock snapshot as they are created.
Also, you should add a mutually exclusive test in the beginning of the backup process for fields that cannot be passed together, like duration and expiration_date OR governance mode with cool_off period. You did it for the client, but its not being checked when using this with a barman server
I can give you two examples of response object that i have from my tests:
LockDuration
{'SnapshotId': 'snap-00adc6c28f422e98c', 'LockState': 'governance', 'LockDuration': 1, 'LockCreatedOn': datetime.datetime(2024, 10, 22, 17, 0, 49, 253000, tzinfo=tzlocal()), 'LockExpiresOn': datetime.datetime(2024, 10, 23, 17, 0, 49, 253000, tzinfo=tzlocal()), 'LockDurationStartTime': datetime.datetime(2024, 10, 22, 17, 0, 49, 253000, tzinfo=tzlocal()), 'ResponseMetadata': {'RequestId': '3f8c3e22-d1b2b2da0', 'HTTPStatusCode': 200, 'HTTPHeaders': {'x-amzn-requestid': '3f8c3e22-d1b2-b2da0', 'cache-control': 'no-cache, no-store', 'strict-transport-security': 'max-age=31536000; includeSubDomains', 'vary': 'accept-encoding', 'content-type': 'text/xml;charset=UTF-8', 'transfer-encoding': 'chunked', 'date': 'Tue, 22 Oct 2024 17:00:49 GMT', 'server': 'AmazonEC2'}, 'RetryAttempts': 0}}
LockExpiresOn
{'SnapshotId': 'snap-0d83456985415270c', 'LockState': 'compliance-cooloff', 'CoolOffPeriod': 3, 'CoolOffPeriodExpiresOn': datetime.datetime(2024, 10, 22, 19, 55, 3, 461000, tzinfo=tzlocal()), 'LockCreatedOn': datetime.datetime(2024, 10, 22, 16, 55, 3, 461000, tzinfo=tzlocal()), 'LockExpiresOn': datetime.datetime(2024, 10, 24, 15, 53, 35, 457000, tzinfo=tzlocal()), 'LockDurationStartTime': datetime.datetime(2024, 10, 22, 16, 55, 3, 461000, tzinfo=tzlocal()), 'ResponseMetadata': {'RequestId': 'bac96fa3-008405f8d79', 'HTTPStatusCode': 200, 'HTTPHeaders': {'x-amzn-requestid': 'bac969e2e-6808405f8d79', 'cache-control': 'no-cache, no-store', 'strict-transport-security': 'max-age=31536000; includeSubDomains', 'vary': 'accept-encoding', 'content-type': 'text/xml;charset=UTF-8', 'transfer-encoding': 'chunked', 'date': 'Tue, 22 Oct 2024 16:55:02 GMT', 'server': 'AmazonEC2'}, 'RetryAttempts': 0}}
@@ -761,10 +781,12 @@ def take_snapshot_backup(self, backup_info, instance_identifier, volumes): | |||
snapshot_name, snapshot_resp = self._create_snapshot( | |||
backup_info, volume_identifier, volume_metadata.id | |||
) | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
parsed_args = parser.parse_args(args=args) | ||
|
||
# Perform mutual exclusivity check | ||
if parsed_args.aws_snapshot_lock_duration is not None and parsed_args.aws_snapshot_lock_expiration_date is not None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Move and refactor this in the _validate_config
function in this module.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As this is only about mutually exclusiveness, we can also leverage of argparse
to handle that for us. Something like this:
s3_lock_target_group = s3_arguments.add_mutually_exclusive_group()
s3_lock_target_group.add_argument(
"--aws-snapshot-lock-duration",
help="The duration (in days) for which the snapshot should be locked. Must be between 1 and 36500. To lock a snapshopt, you must specify either this argument or --aws-snapshot-lock-expiration-date, but not both.",
type=check_aws_snapshot_lock_duration_range,
)
s3_lock_target_group.add_argument(
"--aws-snapshot-lock-expiration-date",
help="The expiration date for a locked snapshot in the format YYYY-MM-DDThh:mm:ss.sssZ. To lock a snapshot, you must specify either this argument or --aws-snapshot-lock-duration, but not both.",
type=check_timestamp
)
If you attempt to specify both, argparse
would throw an error:
barman-cloud-backup: error: argument --aws-snapshot-lock-expiration-date: not allowed with argument --aws-snapshot-lock-duration
logging.info( | ||
"Snapshot %s locked in state '%s' (lock duration: %s days, cool-off period: %s hours, " | ||
"cool-off period expires on: %s, lock expires on: %s, lock duration time: %s)", | ||
snapshot.identifier, | ||
resp["LockState"], | ||
resp["LockDuration"], | ||
resp["CoolOffPeriod"], | ||
resp["CoolOffPeriodExpiresOn"], | ||
resp["LockExpiresOn"], | ||
resp["LockDurationStartTime"] | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In accordance with what @andremagui said, I think this is the biggest problem here. It seems the response from AWS is not always the same depending on the request parameters you pass, so you should ideally not rely on their response for logging.
E.g. if i run barman-cloud-backup
with --aws-snapshot-lock-mode governance --aws-snapshot-lock-duration 1
i get:
2024-10-23 14:14:14,636 [3122] ERROR: Backup failed uploading data ('CoolOffPeriod')
Because CoolOffPeriod
is not present in the response:
{'SnapshotId': 'snap-0a9e1d413ef5a9254', 'LockState': 'governance', 'LockDuration': 1, 'LockCreatedOn': datetime.datetime(2024, 10, 23, 14, 14, 14, 594000, tzinfo=tzlocal()), 'LockExpiresOn': datetime.datetime(2024, 10, 24, 14, 14, 14, 594000, tzinfo=tzlocal()), 'LockDurationStartTime': datetime.datetime(2024, 10, 23, 14, 14, 14, 594000, tzinfo=tzlocal()), 'ResponseMetadata': {'RequestId': '91c97b91-9dde-4d38-a652-0607ff9574f8', 'HTTPStatusCode': 200, 'HTTPHeaders': {'x-amzn-requestid': '91c97b91-9dde-4d38-a652-0607ff9574f8', 'cache-control': 'no-cache, no-store', 'strict-transport-security': 'max-age=31536000; includeSubDomains', 'vary': 'accept-encoding', 'content-type': 'text/xml;charset=UTF-8', 'transfer-encoding': 'chunked', 'date': 'Wed, 23 Oct 2024 14:14:14 GMT', 'server': 'AmazonEC2'}, 'RetryAttempts': 0}}
The snapshot itself was created and locked but the process errored out and the backup is not completed successfully in Barman.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we want to use a formatted message, we should at least use resp.get
to return default values when the keys are not present.
if snapshot.snapshot_lock_mode is not None: | ||
resp = self.ec2_client.describe_locked_snapshots( | ||
SnapshotIds=[snapshot.identifier], | ||
) | ||
|
||
if resp["Snapshots"] and resp["Snapshots"][0]["LockState"] != "expired": | ||
logging.warning("Skipping deletion of snapshot %s as it not expired yet", snapshot.identifier) | ||
continue | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This has do be done earlier in the code. By doing it here you are only avoiding the deletion of the snapshot itself, but the backup metadata in the s3 bucket is still deleted. This would make Barman lose track of the backup while the snapshot itself will still exist.
E.g.
- I have a backup
20241023T135554
in my catalog:
$ barman-cloud-backup-list s3://barman-testing-snapshot-tags pg-server
Backup ID End Time Begin Wal Archival Status Name
20241023T135554 2024-10-23 13:56:10 000000010000000000000024
- I try to delete it and receive your message saying it was skipped.
$ barman-cloud-backup-delete s3://barman-testing-snapshot-tags pg-server -b 20241023T135554
2024-10-23 13:57:17,634 [2735] WARNING: Skipping deletion of snapshot snap-03c42da5c8a2f2ce7 as it not expired yet
- However, its metadata was deleted from s3, so Barman is no longer aware of it.
$ barman-cloud-backup-list s3://barman-testing-snapshot-tags pg-server
Backup ID End Time Begin Wal Archival Status Name
$ barman-cloud-backup-delete s3://barman-testing-snapshot-tags pg-server -b 20241023T135554
2024-10-23 13:57:34,720 [2736] WARNING: Backup 20241023T135554 does not exist
I believe you would have to handle this in cloud_backup_delete.py
in _delete_backup
. Also in backup.py
in delete_backup_data
for when running it through barman standard server instead of cloud-barman-backup-delete
script.
def parse_datetime(value): | ||
""" | ||
Parse a string, transforming it in a datetime object. | ||
Accepted format: YYYY-MM-DDThh:mm:ss.sssZ | ||
|
||
:param str value: the string to evaluate | ||
""" | ||
|
||
return datetime.datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%fZ") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can keep it as it is for now as this date format is very common.
I think you mistyped the return statement of your suggestion @andremagui.
@ruimarinho we have provided some feedback, suggestions and patches to this PR, are you interested in bringing this forward yourself? |
parsed_args = parser.parse_args(args=args) | ||
|
||
# Perform mutual exclusivity check | ||
if parsed_args.aws_snapshot_lock_duration is not None and parsed_args.aws_snapshot_lock_expiration_date is not None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As this is only about mutually exclusiveness, we can also leverage of argparse
to handle that for us. Something like this:
s3_lock_target_group = s3_arguments.add_mutually_exclusive_group()
s3_lock_target_group.add_argument(
"--aws-snapshot-lock-duration",
help="The duration (in days) for which the snapshot should be locked. Must be between 1 and 36500. To lock a snapshopt, you must specify either this argument or --aws-snapshot-lock-expiration-date, but not both.",
type=check_aws_snapshot_lock_duration_range,
)
s3_lock_target_group.add_argument(
"--aws-snapshot-lock-expiration-date",
help="The expiration date for a locked snapshot in the format YYYY-MM-DDThh:mm:ss.sssZ. To lock a snapshot, you must specify either this argument or --aws-snapshot-lock-duration, but not both.",
type=check_timestamp
)
If you attempt to specify both, argparse
would throw an error:
barman-cloud-backup: error: argument --aws-snapshot-lock-expiration-date: not allowed with argument --aws-snapshot-lock-duration
@@ -761,10 +781,12 @@ def take_snapshot_backup(self, backup_info, instance_identifier, volumes): | |||
snapshot_name, snapshot_resp = self._create_snapshot( | |||
backup_info, volume_identifier, volume_metadata.id | |||
) | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should be locked. | ||
:param int lock_cool_off_period: The cool-off period (in hours) for the snapshot. | ||
:param str lock_expiration_date: The expiration date for the snapshot in the format | ||
YYYY-MM-DDThh:mm:ss.sssZ. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
YYYY-MM-DDThh:mm:ss.sssZ. | |
``YYYY-MM-DDThh:mm:ss.sssZ``. |
Sphinx recommendation for literals.
logging.info( | ||
"Snapshot %s locked in state '%s' (lock duration: %s days, cool-off period: %s hours, " | ||
"cool-off period expires on: %s, lock expires on: %s, lock duration time: %s)", | ||
snapshot.identifier, | ||
resp["LockState"], | ||
resp["LockDuration"], | ||
resp["CoolOffPeriod"], | ||
resp["CoolOffPeriodExpiresOn"], | ||
resp["LockExpiresOn"], | ||
resp["LockDurationStartTime"] | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we want to use a formatted message, we should at least use resp.get
to return default values when the keys are not present.
@@ -1027,12 +1103,14 @@ def __init__( | |||
:param str device_name: The device name used in the AWS API. | |||
:param str snapshot_id: The snapshot ID used in the AWS API. | |||
:param str snapshot_name: The snapshot name stored in the `Name` tag. | |||
:param str snapshot_lock_mode: The mode with which the snapshot has been locked, if set. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
:param str snapshot_lock_mode: The mode with which the snapshot has been locked, if set. | |
:param str snapshot_lock_mode: The mode with which the snapshot has been locked (``governance`` or ``compliance``), if set. |
Parse a string, transforming it in a datetime object. | ||
Accepted format: YYYY-MM-DDThh:mm:ss.sssZ | ||
|
||
:param str value: the string to evaluate |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Parse a string, transforming it in a datetime object. | |
Accepted format: YYYY-MM-DDThh:mm:ss.sssZ | |
:param str value: the string to evaluate | |
Parse a string, transforming it in a datetime object. | |
Accepted format: ``YYYY-MM-DDThh:mm:ss.sssZ`` | |
:param str value: the string to evaluate | |
:return: *value* parsed as :class:`datetime.datetime`. |
"aws_snapshot_lock_duration": int, | ||
"aws_snapshot_lock_cool_off_period": int, | ||
"aws_snapshot_lock_expiration_date": parse_datetime, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could somehow leverage of the utils functions that you created for validating the CLI arguments of barman-cloud
, to validate all the new configuration options (including the lock mode).
@@ -1761,6 +1761,7 @@ def test_help_output(self, minimal_parser, capsys): | |||
if sys.version_info < (3, 10): | |||
options_label = "optional arguments" | |||
expected_output = self._expected_help_output.format(options_label=options_label) | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hello, we've had internal communication with Rui and decided the Barman team will finish the work here. Rui's commits will be used and have been a valuable source for finishing this feature which we see as very important. This said, we will close this PR, work internally (we have integration tests we can run) and make this available in the next release if nothing unexpected comes up. Thank you very much @ruimarinho |
This pull request introduces support for AWS EBS Snapshot Lock to protect Amazon EBS snapshots against accidental or malicious deletions, or to store them in WORM (Write-Once-Read-Many) format for a specific duration.
While a snapshot is locked, as long as it is out of the cool-off period, it can't be deleted by any user, regardless of their IAM permissions.
Overview of changes
--aws-snapshot-lock-mode
--aws-snapshot-lock-duration
--aws-snapshot-lock-cool-off-period
--aws-snapshot-lock-expiration-date
snapshot_lock_mode
to backup metadata json.expired
.If the approach looks good, I will start working on documentation next.
Closes #972.